Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
                                            Some full text articles may not yet be available without a charge during the embargo (administrative interval).
                                        
                                        
                                        
                                            
                                                
                                             What is a DOI Number?
                                        
                                    
                                
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
- 
            Ontologies are critical for organizing and interpreting complex domain-specific knowledge, with applications in data integration, functional prediction, and knowledge discovery. As the manual curation of ontology annotations becomes increasingly infeasible due to the exponential growth of biomedical and genomic data, natural language processing (NLP)-based systems have emerged as scalable alternatives. Evaluating these systems requires robust semantic similarity metrics that account for hierarchical and partially correct relationships often present in ontology annotations. This study explores the integration of graph-based and language-based embeddings to enhance the performance of semantic similarity metrics. Combining embeddings generated via Node2Vec and large language models (LLMs) with traditional semantic similarity metrics, we demonstrate that hybrid approaches effectively capture both structural and semantic relationships within ontologies. Our results show that combined similarity metrics outperform individual metrics, achieving high accuracy in distinguishing child–parent pairs from random pairs. This work underscores the importance of robust semantic similarity metrics for evaluating and optimizing NLP-based ontology annotation systems. Future research should explore the real-time integration of these metrics and advanced neural architectures to further enhance scalability and accuracy, advancing ontology-driven analyses in biomedical research and beyond.more » « lessFree, publicly-accessible full text available March 1, 2026
- 
            Free, publicly-accessible full text available January 1, 2026
- 
            Abstract There is a growing interest in using social media content for Natural Language Processing applications. However, it is not easy to computationally identify the most relevant set of tweets related to any specific event. Challenging semantics coupled with different ways for using natural language in social media make it difficult for retrieving the most relevant set of data from any social media outlet. This paper seeks to demonstrate a way to present the changing semantics of Twitter within the context of a crisis event, specifically tweets during Hurricane Irma. These methods can be used to identify the most relevant corpus of text for analysis in relevance to a specific incident such as a hurricane. Using an implementation of the Word2Vec method of Neural Network training mechanisms to create Word Embeddings, this paper will: discuss how the relative meaning of words changes as events unfold; present a mechanism for scoring tweets based upon dynamic, relative context relatedness; and show that similarity between words is not necessarily static. We present different methods for training the vector model in Word2Vec for identification of the most relevant tweets for any search query. The impact of tuning parameters such as Word Window Size, Minimum Word Frequency, Hidden Layer Dimensionality, and Negative Sampling on model performance was explored. The window containing the local maximum for AU_ROC for each parameter serves as a guide for other studies using the methods presented here for social media data analysis.more » « less
- 
            Abstract BackgroundAnnotating scientific literature with ontology concepts is a critical task in biology and several other domains for knowledge discovery. Ontology based annotations can power large-scale comparative analyses in a wide range of applications ranging from evolutionary phenotypes to rare human diseases to the study of protein functions. Computational methods that can tag scientific text with ontology terms have included lexical/syntactic methods, traditional machine learning, and most recently, deep learning. ResultsHere, we present state of the art deep learning architectures based on Gated Recurrent Units for annotating text with ontology concepts. We use the Colorado Richly Annotated Full Text Corpus (CRAFT) as a gold standard for training and testing. We explore a number of additional information sources including NCBI’s BioThesauraus and Unified Medical Language System (UMLS) to augment information from CRAFT for increasing prediction accuracy. Our best model results in a 0.84 F1 and semantic similarity. ConclusionThe results shown here underscore the impact for using deep learning architectures for automatically recognizing ontology concepts from literature. The augmentation of the models with biological information beyond that present in the gold standard corpus shows a distinct improvement in prediction accuracy.more » « less
 An official website of the United States government
An official website of the United States government 
				
			 
					 
					
 
                                     Full Text Available
                                                Full Text Available